Load XML file into Hive Table using xpath

Smaple input File
<employee>
<id>1</id>
<name>Satish Kumar</name>
<designation>Technical Lead</designation>
</employee>
<employee>
<id>2</id>
<name>Ramya</name>
<designation>Testing</designation>
</employee>


Step-1 Bring each record in one line
cat employees.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</employee>|</employee>\n|g' | grep -v '^\s*$' > employees_records.xml

cat employees_records.xml

Step-2 Load the file in HDFS
hadoop fs -mkdir /user/hive/sample-xml-inputs
hadoop fs -put employees_records.xml /user/hive/sample-xml-inputs
hadoop fs -cat /user/hive/sample-xml-inputs/employees_records.xml

Step-3 Create a Hive table and point xml file
create external table xml_table_org( xmldata string) LOCATION '/user/hive/sample-xml-inputs/';
select * from xml_table_org;

Step-4 Load data form stg table to main Table
CREATE TABLE xml_table AS SELECT xpath_int(xmldata,'employee/id'),xpath_string(xmldata,'employee/name'),xpath_string(xmldata,'employee/designation') FROM xml_table_org;
select * from xml_table;

No comments:

Post a Comment